Generalized substring selectivity estimation
Similar resources
Generalized substring selectivity estimation
In a variety of settings from relational databases to LDAP to Web applications, there is an increasing need to quickly and accurately estimate the count of tuples (LDAP entries, Web documents, etc.) matching Boolean substring queries. In providing such selectivity estimates, the correlation between different occurrences of substrings is crucial. Selectivity estimation for generalized Boolean qu...
Multi-Dimensional Substring Selectivity Estimation
With the explosion of the Internet, LDAP directories and XML, there is an ever greater need to evaluate queries involving (sub)string matching. In many cases, matches need to be on multiple attributes/dimensions, with correlations between the dimensions. Effective query optimization in this context requires good selectivity estimates. In this paper, we use multi-dimensional count-suffix trees as t...
Generalized Substring Compression
In substring compression one is given a text to preprocess so that, upon request, a compressed substring is returned. Generalized substring compression is the same with the following twist. The queries contain an additional context substring (or a collection of context substrings) and the answers are the substring in compressed format, where the context substring is used to make the compression...
Generalized closest substring encryption
We propose a new cryptographic notion called generalized closest substring encryption. In this notion, a ciphertext encrypted with a string S can be decrypted with a private key of another string S′, if there exist a substring of S, i.e. Ŝ, and a substring of S′, i.e. Ŝ′, that are “close” to each other measured by their “overlap distance”. The overlap distance between Ŝ and Ŝ′ is the number of ...
Substring Count Estimation in Extremely Long Strings
To estimate the number of substring matches against string data, count suffix trees (CS-tree) have been used as a kind of alphanumeric histograms. Although the trees are useful for substring count estimation in short data strings (e.g. name or title), they reveal several drawbacks when the target is changed to extremely long strings. First, it becomes too hard or at least slow to build CS-trees...
Journal
Journal title: Journal of Computer and System Sciences
Year: 2003
ISSN: 0022-0000
DOI: 10.1016/s0022-0000(02)00031-4